Learning the Taxonomy of Function Words for Parsing
Authors
Abstract
Completely data-driven grammar training is prone to over-fitting. Human-defined word class knowledge helps address this issue. However, a manually built word class taxonomy may be unreliable and poorly suited to statistical natural language processing, in addition to its insufficient coverage of linguistic phenomena and limited domain adaptivity. In this paper, a formalized representation of function word subcategorization is developed automatically for parsing. The function word classification, which represents intrinsic features of syntactic usage, is used to supervise grammar induction, and the structure of the taxonomy is learned simultaneously. Grammar learning is thus no longer a one-way training process supervised by hierarchical knowledge, but an interactive process between knowledge structure learning and grammar training. The established taxonomy reflects the stochastic significance of the diverse syntactic features. Experiments on both the Penn Chinese Treebank and the Tsinghua Treebank show that the proposed method improves parsing performance by 1.6% and 7.6%, respectively, over the baseline.
Similar resources
Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees
Automatically identifying the words that bear semantic roles (such as Agent, Patient, or Source) in a sentence and assigning the correct roles to them may improve many natural language processing tasks, including information extraction, question answering, text summarization, and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...
On the Representation of Bloom's Revised Taxonomy in Interchange Coursebooks
This study evaluates the Interchange series (2005), which remains a fundamental set of coursebooks in EFL curriculum settings, in terms of the learning objectives in Bloom's Revised Taxonomy (2001), to see which levels of the taxonomy are emphasized more in these coursebooks. For this purpose, the contents of Interchange textbooks were codified based on a coding scheme designed by th...
Cubic-time Parsing and Learning Algorithms for Grammatical Bigram Models
This technical report presents a probabilistic model of English grammar that is based upon “grammatical bigrams”, i.e., syntactic relationships between pairs of words. Because of its simplicity, the grammatical bigram model admits cubic-time parsing and unsupervised learning algorithms, which are described in detail.
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging together with dependency parsing in a pipeline mode. Unfortunately, in pipel...
On the Applicability of Oxford's Taxonomy of Learner Strategies to Translation Tasks
During the last three decades, especially the 1980s, language learning specialists have been busy discovering the nature of language learning strategies, describing them, and formulating their relationships with other language learning factors. In line with these studies, the field of translation studies has undergone a complete revolution in terms of its perspective toward its research prioritie...